Purpose

The purpose of this document is to illustrate time series analysis and forecasting. We will use a simulated dataset to analyze things like visits, discharges and payments. To perform these analyses we will be following the modeltime workflow. This report will be broken down into sections that follow that same workflow.

Data View

Lets take a look at our data and see what it has.

df_tbl %>%
  glimpse()
## Rows: 160,649
## Columns: 12
## $ mrn                     <chr> "83537791", "66727914", "84487881", "19795240"~
## $ visit_id                <chr> "8762718156", "1283065398", "1171004549", "133~
## $ visit_start_date_time   <dttm> 2012-01-01 06:14:00, 2011-12-26 01:14:00, 201~
## $ visit_end_date_time     <dttm> 2012-01-01 00:00:00, 2012-01-01 12:06:00, 201~
## $ total_charge_amount     <dbl> 4379.00, 62580.61, 38466.48, 16561.16, 16725.3~
## $ total_adjustment_amount <dbl> -4071.41, -39117.58, -26930.67, -11211.26, -11~
## $ total_payment_amount    <dbl> -307.59, -23463.03, -11535.81, -5349.90, -5273~
## $ payer_grouping          <chr> "Medicaid", "Commercial", "Blue Cross", "Medic~
## $ service_line            <chr> "General Outpatient", "Medical", "Surgical", "~
## $ ip_op_flag              <chr> "O", "I", "I", "I", "I", "I", "I", "I", "I", "~
## $ adm_date                <date> 2012-01-01, 2011-12-26, 2011-12-31, 2011-12-2~
## $ dsch_date               <date> 2012-01-01, 2012-01-01, 2012-01-01, 2012-01-0~
skim(df_tbl)
Data summary
Name df_tbl
Number of rows 160649
Number of columns 12
_______________________
Column type frequency:
character 5
Date 2
numeric 3
POSIXct 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
mrn 0 1 8 8 0 75614 0
visit_id 0 1 10 10 0 160637 0
payer_grouping 0 1 1 14 0 13 0
service_line 0 1 2 44 0 29 0
ip_op_flag 0 1 1 1 0 2 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
adm_date 0 1 2010-11-26 2019-12-31 2015-08-24 2956
dsch_date 0 1 2012-01-01 2019-12-31 2015-08-28 2922

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
total_charge_amount 0 1.00 46082.61 63709.76 0.13 12960.31 29371.79 55058.56 3686443.50 ▇▁▁▁▁
total_adjustment_amount 0 1.00 -39892.10 59053.62 -3059344.26 -48460.49 -23357.11 -9586.85 87590.13 ▁▁▁▁▇
total_payment_amount 3299 0.98 -8965.33 12691.35 -627099.24 -10904.82 -6125.13 -2779.48 1629.69 ▁▁▁▁▇

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
visit_start_date_time 0 1 2010-11-26 05:00:00 2019-12-31 05:00:00 2015-08-24 10:02:00 118060
visit_end_date_time 0 1 2012-01-01 00:00:00 2019-12-31 22:58:00 2015-08-28 00:00:00 103820

Preparing Data

Our objectives are to:

Our forecasting will focus on a grouped forecast where we are going to forecast the number of discharges by inpatient/outpatient visit type and by payer grouping.

We are going to do this on a weekly scale.

Aggregate discharges by IP/OP and Payer Grouping by Month

  1. Start with df_tbl
  2. Use summarise_by_time() with .by = "month", and n() the visits.
  3. Save as a new variable called transactions_monthly_tbl
transactions_monthly_tbl <- df_tbl %>%
  filter(payer_grouping != "?") %>%
  group_by(ip_op_flag, payer_grouping) %>%
  summarise_by_time(
    .date_var = dsch_date
    , .by     = "month"
    , value   = n()
  ) %>%
  ungroup()

transactions_monthly_tbl
## # A tibble: 2,137 x 4
##    ip_op_flag payer_grouping dsch_date  value
##    <chr>      <chr>          <date>     <int>
##  1 I          Blue Cross     2012-01-01   153
##  2 I          Blue Cross     2012-02-01   156
##  3 I          Blue Cross     2012-03-01   145
##  4 I          Blue Cross     2012-04-01   178
##  5 I          Blue Cross     2012-05-01   188
##  6 I          Blue Cross     2012-06-01   155
##  7 I          Blue Cross     2012-07-01   175
##  8 I          Blue Cross     2012-08-01   183
##  9 I          Blue Cross     2012-09-01   145
## 10 I          Blue Cross     2012-10-01   133
## # ... with 2,127 more rows
transactions_monthly_tbl %>%
  plot_time_series(
    .date_var     = dsch_date
    , .color_var  = ip_op_flag
    , .facet_vars = payer_grouping
    , .facet_ncol = 4
    , .value      = value
    , .smooth     = FALSE
  )